My ML Project

Authors
Affiliation

Name I, First Name I

Name of the University

Name II, First Name II

Published

April 29, 2024

Abstract

The following machine learning project focuses on…

1 Introduction

  • Overview and Motivation
  • Related Work
  • Research questions

2 Testing that R and Python work

#> [1] "hello"
#> 30.0

3 Data

  • Sources
  • Description
  • Wrangling/cleaning
  • Spotting mistakes and missing data (could be part of EDA too)
  • Listing anomalies and outliers (could be part of EDA too)

3.1 Loading and basic cleaning (not yet complete)

3.2 Change the path below

3.3 Loading and basic cleaning (not yet complete)
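A minimal Python sketch of what a loading and first cleaning step could look like, using only the standard library. The helper names `to_float` and `load_properties`, the CSV layout, and the sample rows are assumptions for illustration; only the column names `price`, `number_of_rooms`, and `canton` come from the regression formula in section 3.9. Dropping rows with a missing or non-numeric price or room count is one possible cleaning rule, not necessarily the one used in this project.

```python
import csv
import io

def to_float(value):
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def load_properties(handle):
    """Read property rows from an open CSV file and keep the usable ones."""
    cleaned = []
    for row in csv.DictReader(handle):
        price = to_float(row.get("price"))
        rooms = to_float(row.get("number_of_rooms"))
        if price is None or rooms is None:
            continue  # assumed rule: drop rows with missing key fields
        row["price"] = price
        row["number_of_rooms"] = rooms
        # Normalise the text column so "Geneva " and "geneva" become one level.
        row["canton"] = (row.get("canton") or "").strip().lower()
        cleaned.append(row)
    return cleaned

# Tiny in-memory stand-in for the real file (made-up rows, not project data).
sample = io.StringIO(
    "price,number_of_rooms,canton\n"
    "450000,3.5,Geneva \n"
    ",4,bern\n"
    "abc,2,vaud\n"
)
properties = load_properties(sample)
print(len(properties), properties[0]["canton"])
```

With a real file, the same function could be called inside `with open(path, newline="", encoding="utf-8") as handle:`, where `path` is whatever location the data is stored at.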

3.4 Histogram of prices

3.5 Histogram of prices for each property type

Note: only prices between 0 and 500,000 are shown, so some outliers do not appear.
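The price filter described in the note, combined with per-group binning, can be sketched as follows. This is an illustration only: the bin width, the inclusive treatment of both range limits, the function name `grouped_histogram`, and the sample rows are assumptions, not the plotting code used in the report.

```python
from collections import defaultdict

PRICE_MIN, PRICE_MAX = 0, 500_000   # range stated in the note
BIN_WIDTH = 50_000                  # assumed bin width, for illustration

def grouped_histogram(rows, group_key):
    """Count prices per bin, separately for each value of group_key."""
    n_bins = (PRICE_MAX - PRICE_MIN) // BIN_WIDTH
    counts = defaultdict(lambda: [0] * n_bins)
    for row in rows:
        price = row["price"]
        if not (PRICE_MIN <= price <= PRICE_MAX):
            continue  # outside the plotted range, so not shown
        # A price equal to PRICE_MAX goes into the last bin instead of overflowing.
        index = min(int((price - PRICE_MIN) // BIN_WIDTH), n_bins - 1)
        counts[row[group_key]][index] += 1
    return dict(counts)

# Made-up rows for illustration only.
rows = [
    {"price": 120_000, "property_type": "Villa"},
    {"price": 480_000, "property_type": "Villa"},
    {"price": 500_000, "property_type": "Chalet"},
    {"price": 2_000_000, "property_type": "Chalet"},  # excluded outlier
]
histogram = grouped_histogram(rows, "property_type")
```

Passing a different `group_key` (for example `"canton"` or `"year_category"`) gives the counts behind the other grouped histograms in this section.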

3.6 Histogram of prices for each year category

Note: only prices between 0 and 500,000 are shown, so some outliers do not appear.

3.7 Histogram of prices for each canton

Note: only prices between 0 and 500,000 are shown, so some outliers do not appear.

3.8 Histogram of prices for each number of rooms

Note: only prices between 0 and 500,000 are shown, so some outliers do not appear.

The graph below also shows only apartments with fewer than 10 rooms (the code can be changed if needed).

3.9 Test Regression

#> 
#> Call:
#> lm(formula = price ~ number_of_rooms + canton + property_type + 
#>     year_category, data = properties)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -7013788  -514438  -138948   264464 21628996 
#> 
#> Coefficients:
#>                               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                    -677158      55739  -12.15  < 2e-16 ***
#> number_of_rooms                 337946       6166   54.81  < 2e-16 ***
#> cantonappenzell-ausser-rhoden  -464945     126861   -3.66  0.00025 ***
#> cantonappenzell-inner-rhoden   -874289     392590   -2.23  0.02596 *
#> cantonbasel-landschaft         -195701      57943   -3.38  0.00073 ***
#> cantonbasel-stadt               218682     105130    2.08  0.03753 *
#> cantonbern                     -478376      46221  -10.35  < 2e-16 ***
#> cantonfribourg                 -781416      48366  -16.16  < 2e-16 ***
#> cantongeneva                   2025260      62234   32.54  < 2e-16 ***
#> cantonglarus                   -573694     173301   -3.31  0.00093 ***
#> cantongrisons                    59982      71666    0.84  0.40262
#> cantonjura                     -801519      77323  -10.37  < 2e-16 ***
#> cantonlucerne                  -187979      73261   -2.57  0.01030 *
#> cantonneuchatel                -353635      65590   -5.39  7.1e-08 ***
#> cantonnidwalden                 991055     244826    4.05  5.2e-05 ***
#> cantonobwalden                  366062     244712    1.50  0.13470
#> cantonschaffhausen             -584997     120601   -4.85  1.2e-06 ***
#> cantonschwyz                     18070     132558    0.14  0.89157
#> cantonsolothurn                -784557      61024  -12.86  < 2e-16 ***
#> cantonst-gallen                -404890      55918   -7.24  4.6e-13 ***
#> cantonthurgau                   -37337      63444   -0.59  0.55620
#> cantonticino                    125913      38499    3.27  0.00108 **
#> cantonuri                         9578     155772    0.06  0.95097
#> cantonvalais                   -219964      39781   -5.53  3.3e-08 ***
#> cantonvaud                       89914      40258    2.23  0.02553 *
#> cantonzug                       801241     153896    5.21  1.9e-07 ***
#> cantonzurich                    316099      49688    6.36  2.0e-10 ***
#> property_typeAttic flat         311019      45964    6.77  1.4e-11 ***
#> property_typeBifamiliar house    41841      42939    0.97  0.32986
#> property_typeChalet            1136804      56690   20.05  < 2e-16 ***
#> property_typeDuplex              -5091      56699   -0.09  0.92846
#> property_typeFarm house         237939     118848    2.00  0.04529 *
#> property_typeLoft               285442     291977    0.98  0.32827
#> property_typeRoof flat            4801      64587    0.07  0.94074
#> property_typeRustic house      -281265     249068   -1.13  0.25880
#> property_typeSingle house       389066      24252   16.04  < 2e-16 ***
#> property_typeTerrace flat        88662      87071    1.02  0.30856
#> property_typeVilla             1278283      38187   33.47  < 2e-16 ***
#> year_category1919-1945           10462      61602    0.17  0.86515
#> year_category1946-1960           76025      57261    1.33  0.18429
#> year_category1961-1970          232055      48444    4.79  1.7e-06 ***
#> year_category1971-1980          210609      43422    4.85  1.2e-06 ***
#> year_category1981-1990          237789      43679    5.44  5.3e-08 ***
#> year_category1991-2000          477554      45385   10.52  < 2e-16 ***
#> year_category2001-2005          519338      55369    9.38  < 2e-16 ***
#> year_category2006-2010          591351      48030   12.31  < 2e-16 ***
#> year_category2011-2015          724194      47219   15.34  < 2e-16 ***
#> year_category2016-2024          641233      36926   17.37  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1240000 on 21363 degrees of freedom
#>   (72 observations deleted due to missingness)
#> Multiple R-squared:  0.323,  Adjusted R-squared:  0.321 
#> F-statistic:  216 on 47 and 21363 DF,  p-value: <2e-16
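To make the summary easier to read, the sketch below shows how a fitted price is assembled from the coefficients: the intercept, plus the per-room slope times the number of rooms, plus one offset for each factor level. The coefficient values are copied from the output above; the function name `predict_price` and the example property are assumptions for illustration. Levels that do not appear in the output are the reference levels chosen by R and contribute an offset of zero.

```python
# Coefficients copied from the lm() summary above (only a few levels shown).
INTERCEPT = -677_158
PER_ROOM = 337_946
CANTON = {"geneva": 2_025_260, "bern": -478_376}
PROPERTY_TYPE = {"Villa": 1_278_283, "Duplex": -5_091}
YEAR_CATEGORY = {"2016-2024": 641_233, "1961-1970": 232_055}

def predict_price(rooms, canton, property_type, year_category):
    """Fitted price: intercept + slope * rooms + one offset per factor level."""
    return (
        INTERCEPT
        + PER_ROOM * rooms
        + CANTON[canton]
        + PROPERTY_TYPE[property_type]
        + YEAR_CATEGORY[year_category]
    )

# Hypothetical property: a 5-room villa in Geneva built in 2016-2024.
villa = predict_price(5, "geneva", "Villa", "2016-2024")
print(villa)

# The t value column is the estimate divided by its standard error.
t_rooms = PER_ROOM / 6_166
print(round(t_rooms, 2))
```

With a residual standard error of about 1,240,000 and an R-squared of 0.323, any single fitted value from this test model carries a large margin of error.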

4 Supervised learning

  • Data splitting (even if a single training/test split is enough for the overall analysis, at least one CV or bootstrap must be used)
  • Two or more models
  • Two or more scores
  • Tuning of one or more hyperparameters per model
  • Interpretation of the model(s)
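As a rough illustration of the first and third requirements, the sketch below builds k-fold cross-validation splits and computes two scores, RMSE and MAE. Everything in it is an assumption for illustration: the function names, the choice of scores, the made-up prices, and the "model", which is only the training-set mean used as a baseline.

```python
import math
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up prices; the baseline "model" predicts the training-set mean.
prices = [300_000, 450_000, 520_000, 610_000, 750_000, 980_000]
scores = []
for train, test in k_fold_indices(len(prices), k=3):
    mean_price = sum(prices[i] for i in train) / len(train)
    actual = [prices[i] for i in test]
    predicted = [mean_price] * len(test)
    scores.append((rmse(actual, predicted), mae(actual, predicted)))
```

Each row is used for testing exactly once, so averaging the per-fold scores gives an estimate that does not depend on a single lucky or unlucky split.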

5 Unsupervised learning

  • Clustering and/or dimension reduction

6 Conclusion

  • Brief summary of the project
  • Take-home message
  • Limitations
  • Future work?